Spatial data

  • Many natural processes take place in space

  • Spatial data are data which have any form of geographical information attached to them.

  • The emergence of modelling frameworks for spatial data in ecology and environmental sciences has been facilitated by the rise of new technologies.

Spatial data

The overall goal of any piece of spatial analysis is to understand the spatial patterns in our data. This could involve:

  • Estimating differences in mean, variance, or some other summary statistic over space.

  • Predicting the value at some unobserved location.

  • Identifying hotspots with high (or low) values compared to the rest of the region.

Aims also vary with different types of spatial data.

Types of spatial data

Discrete space:

  • Data on a spatial grid (areal data)

Continuous space:

  • Geostatistical (geo-referenced) data

  • Spatial point pattern data

The components of the models we will cover are used to reflect spatial dependence structures in discrete and continuous space.

Spatial data structures

We can distinguish three types of spatial data structures.

Areal data

In areal data our measurements are summarised across a set of discrete, non-overlapping spatial units.

Map of bird conservation regions (BCRs) showing the proportion of bird species within each region with a declining trend.

Geostatistical data

In geostatistical data, measurements of a continuous process are taken at a set of fixed locations.

Scotland river temperature monitoring network.

Point pattern data

In point pattern data we record the locations where events occur (e.g. trees in a forest, earthquakes), and the coordinates of these occurrences are our data.

Occurrence records of four ungulate species in Tibet.

Spatial Scales

Scales describe the spatial and temporal dimensions at which an ecological or environmental process occurs. The spatial scale is often described by:

  • grain or spatial resolution: finest spatial unit of measurement for a given process
  • extent: total length or area of the study

There is typically a trade-off between the grain and the extent, mainly due to practicality.

The scale at which an ecological or environmental process occurs can have a major impact on the interpretation of our analysis.

Modifiable Areal Unit Problem (MAUP)

  • The spatial scale of analysis is crucial in ecological and environmental studies, as it can significantly influence the patterns we observe and the conclusions we draw.
  • Changing the grain of the study (while holding the extent constant) can introduce bias and uncertainty in the observed ecological patterns, e.g. when aggregating individual-level data.
  • The modifiable areal unit problem refers to the fact that a relationship that exists at one level of aggregation does not necessarily hold at another level of aggregation.


Ecological fallacy

  • A relationship that exists at one level of aggregation does not necessarily hold at another level of aggregation.
  • In ecology this is known as the ecological fallacy, which arises when incorrect inferences about individual sample units are made from aggregated data.
  • When individual-level data are grouped, the variability within units is lost, which can mask finer-scale patterns or exaggerate certain trends.

Spatial modelling: why necessary?

Tobler’s first law of geography states that:

“Everything is related to everything else, but near things are more related than distant things.”

Standard models assume independent observations

  • Spatial data are often not independent

  • Two nearby observations are similar \(\rightarrow\) they do not provide independent information

Spatial models include special components to explicitly model dependence.

Spatial dependence

  • Inference and prediction are essential in ecology and conservation, but spatial dependency can complicate statistical analysis.

  • Understanding how spatial dependence structures impact our ecological analyses is crucial to address many ecological problems.

  • Spatial dependence may be caused by:

    1. Endogenous processes inherent to the system
      (e.g., localized dispersal leading to organism clustering or social and grouping behaviors)
    2. Exogenous factors such as spatially dependent environmental gradients used by the organism of interest
    3. Modelling mis-specifications, including the omission of key covariates or incorrect functional assumptions

Areal Data

Areal Data

  • Areal data are data which come from well-defined geographical units such as postcode areas, health boards, or pixels on a satellite image.

  • As with other types of spatial modelling, our goal is to observe and explain spatial variation in our data.

  • Again, we are trying to understand and account for spatial dependence.

  • Generally, we aim to produce a smoothed map that summarises the spatial patterns observed in our data.

Overview of Areal processes

An areal process (or lattice process) is a stochastic process defined on a set of regions that form a partition of our region of interest \(D\).

  • Let \(B_1, \ldots B_m\) be our set of \(m\) distinct regions such that: \[\bigcup\limits_{i=1}^m \hspace{1mm}B_i = D.\]

  • Here we require that our regions are non-overlapping, with \[B_i \cap B_j = \emptyset.\]

  • Then our areal process is simply the stochastic process \[\{Z(B_i); i=1,\ldots,m\}.\]

How do we model this?

Imagine we have animal counts in each region. We can model them as Poisson

\[ y_i \sim \mathrm{Poisson}(e^{\eta_i}) \] How do we model the linear predictor \(\eta_i\)?

  • We could model the number of animals in each region independently

\[ \eta_i = \beta_0 + \mathbf{x}_i'\beta + u_i ~~~ u_i \overset{iid}{\sim} N(0,\sigma^2_u) \]

  • regional differences are accounted for through a random effect

How do we model this?

Imagine we have animal counts in each region. We can model them as Poisson

\[ y_i \sim \mathrm{Poisson}(e^{\eta_i}) \] How do we model the linear predictor \(\eta_i\)?

  • But… what if the distribution varies across space, i.e. is structured in space?
  • Do the covariates account for those structures?
  • If there’s an area where the animal is rare, we’ll get lots of zero counts

How do we model this?

Imagine we have animal counts in each region. We can model them as Poisson

\[ y_i \sim \mathrm{Poisson}(e^{\eta_i}) \] How do we model the linear predictor \(\eta_i\)?

  • We could model some dependence across regions:
    • Nearby regions should have similar counts

\[ \eta_i = \beta_0 + \mathbf{x}_i'\beta + u_i ~~~ \mathbf{u} \sim N(\mathbf{0},\Sigma) \]

  • Now the random effect vector \(\mathbf{u} \sim N(0, \Sigma)\) is correlated.
  • How do we do this?

Sparse Matrices

\[ \mathbf{u}\sim\mathcal{N}(0,\Sigma) \]

  1. Force the covariance matrix \(\Sigma\) to be sparse

    • \(\Sigma_{ij} = \text{Cov}(u_i, u_j)\): covariance between \(u_i\) and \(u_j\)
    • \(\Sigma_{ij} = 0\) \(\longrightarrow\) \(u_i\) and \(u_j\) are independent
    • A sparse covariance matrix implies that many elements of \(\mathbf{u}\) are mutually independent… is this desirable?
  2. Force the precision matrix \(\mathbf{Q} = \Sigma^{-1}\) to be sparse

    • What does \(Q_{ij}\) represent?
    • What does a sparse precision matrix imply?

Example: The AR(1) process

Definition

\[ \begin{aligned} \mathbf{i=1}&: x_1 \sim N\left(0, \frac{1}{1-\phi^2}\right)\\ \mathbf{i=2,\dots,T}&: x_i = \phi\ x_{i-1} +\epsilon_i,\ \epsilon_i\sim\mathcal{N}(0,1) \end{aligned} \]

  • Very common to model dependence in time
  • The joint distribution of \(\mathbf{x}=x_1,x_2,\dots\) is Gaussian
  • How do covariance and precision matrices look?
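The AR(1) recursion above is easy to simulate directly. A minimal sketch in Python (the course examples use R; this is an illustrative translation, with the seed chosen arbitrarily):

```python
import numpy as np

def simulate_ar1(T, phi, rng=None):
    """Simulate a stationary AR(1) process with unit innovation variance."""
    rng = np.random.default_rng(rng)
    x = np.empty(T)
    # Stationary initial distribution: x_1 ~ N(0, 1/(1 - phi^2))
    x[0] = rng.normal(0.0, np.sqrt(1.0 / (1.0 - phi**2)))
    for i in range(1, T):
        x[i] = phi * x[i - 1] + rng.normal(0.0, 1.0)
    return x

x = simulate_ar1(10_000, phi=0.8, rng=42)
# The sample variance should be near the stationary variance 1/(1 - phi^2)
print(round(x.var(), 2))
```

Starting from the stationary distribution means every \(x_i\) has the same marginal variance, which is why the covariance matrix below has constant diagonal.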

Covariance and Precision Matrix for AR1

Covariance Matrix

\[ \Sigma = \frac{1}{1-\phi^2} \begin{bmatrix} 1& \phi & \phi^2 & \dots& \phi^N \\ \phi & 1& \phi & \dots& \phi^{N-1} \\ \phi^2 & \phi & 1 & \dots& \phi^{N-2} \\ \dots& \dots& \dots& \dots& \dots& \\ \phi^{N} & \phi^{N-1}& \phi^{N-2} & \dots& 1\\ \end{bmatrix} \]

  • This is a dense matrix.

  • All elements of the \(\mathbf{x}\) vector are dependent.

Covariance and Precision Matrix for AR1

Precision Matrix

\[ \mathbf{Q} = \Sigma^{-1} = \begin{bmatrix} 1& -\phi & 0 & 0 &\dots& 0 \\ -\phi & 1 + \phi^2& -\phi & 0 & \dots& 0 \\ 0 & -\phi & 1+\phi^2 &-\phi & \dots& 0 \\ 0 & 0 & -\phi &1+\phi^2 & \dots & \vdots \\ \vdots& \vdots& \vdots& \ddots& \ddots& -\phi \\ 0 &0 & 0 & \dots & -\phi& 1\\ \end{bmatrix} \]

  • This is a tridiagonal matrix, it is sparse

  • The tridiagonal form of \(\mathbf{Q}\) can be exploited for quick calculations.
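We can verify numerically that inverting the dense AR(1) covariance matrix yields this tridiagonal precision matrix (a Python sketch; the course software is R, and the value of \(\phi\) here is arbitrary):

```python
import numpy as np

phi, T = 0.6, 6
# Dense covariance: Sigma_ij = phi^|i-j| / (1 - phi^2)
idx = np.arange(T)
Sigma = phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi**2)

Q = np.linalg.inv(Sigma)
Q[np.abs(Q) < 1e-10] = 0.0  # zero out numerical noise

# Q is tridiagonal: corner entries 1, interior diagonal 1 + phi^2, off-diagonal -phi
print(np.round(Q, 3))
```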

What is the key property of this example that causes \(\mathbf{Q}\) to be sparse?

Conditional independence

The key lies in the full conditionals

\[ x_t|\mathbf{x}_{-t}\sim\mathcal{N}\left(\frac{\phi}{1+\phi^2}(x_{t-1}+x_{t+1}), \frac{1}{1+\phi^2}\right) \]

  • The circles represent the values of \(x\) at individual time points

  • There is a line between them if they are conditionally dependent

  • Each timepoint is only conditionally dependent on the two closest neighbours

  • The nonzero pattern in the precision matrix is given by the neighborhood structure of the process
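For any Gaussian with precision matrix \(\mathbf{Q}\), the full conditional of \(x_t\) can be read directly off row \(t\): the conditional variance is \(1/Q_{tt}\) and the conditional mean is \(-(1/Q_{tt})\sum_{j\neq t} Q_{tj}x_j\). A quick numerical check for an interior AR(1) time point (Python sketch; the values of \(\phi\) and the neighbouring states are made up):

```python
import numpy as np

phi = 0.7
# Interior row of the AR(1) precision matrix: (-phi, 1 + phi^2, -phi)
Q_tt = 1 + phi**2
x_prev, x_next = 0.5, -1.2  # arbitrary neighbouring values

cond_var = 1.0 / Q_tt
cond_mean = -(1.0 / Q_tt) * (-phi * x_prev + -phi * x_next)

# Matches the full-conditional mean phi/(1 + phi^2) * (x_{t-1} + x_{t+1})
assert np.isclose(cond_mean, phi / (1 + phi**2) * (x_prev + x_next))
print(cond_mean, cond_var)
```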

Conditional autoregressive models

Markov in Space:

First order conditional autoregressive model or a CAR(1) model.

  • Every node is conditionally dependent on its four nearest neighbours

  • Models based on neighbourhood have a name in statistics: they are Markovian models

  • Markovian models are specified entirely through neighbourhood structures

  • Recall our first Model:

    \[ \begin{aligned} y_i &\sim \mathrm{Poisson}(e^{\eta_i}) \\ \eta_i &= \beta_0 + \mathbf{x}_i'\beta + u_i ~~~ \mathbf{u} \sim N(0,\mathbf{Q}^{-1}) \end{aligned} \]

  • This means \(u_i\) is conditionally independent of all other elements \(\mathbf{u}_{-i}\), given the set of its neighbours.

  • For any pair of elements \((i,j)\) in \(\mathbf{u}\), \(u_i\perp u_j|\mathbf{u}_{-ij}\Longleftrightarrow Q_{ij} =0\)

  • \(Q_{ij} \neq 0\) only if \(j\in \{i\}\cup\mathcal{N}(i)\)

(Informal) definition of a GMRF

Gauss Markov random field (GMRF) is

  • a Gaussian distribution where the non-zero elements of the precision matrix are defined by a neighbourhood matrix (or graph structure)
  • each region conditionally has a Gaussian distribution with
  • mean equal to the average of the neighbours and
  • precision proportional to the number of neighbours

GMRFs are key to many inferential approaches:

  • computationally efficient
  • other data structures can be approximated by a GMRF in a clever way

Formal definition of a GMRF

Let \(\mathbf{u}\) be a GMRF wrt \(\mathcal{G} = (\mathcal{V},\mathcal{E})\).

\(\mathcal{V}\) Vertices: \(1,2,\ldots,n\). \(\mathcal{E}\) Edges \(\{i,j\}\)

  • No edge between \(i\) and \(j\) if \(u_i \perp u_j \mid \mathbf{u}_{-ij}.\)

  • An edge between \(i\) and \(j\) if \(u_i \not\perp u_j \mid \mathbf{u}_{-ij}.\)

Key point: A graph defines the sparsity structure of Q

Definition

A random variable \(\mathbf{u}\) is said to be a GMRF with respect to the graph \(\mathcal{G}\) with vertices \(\{1, 2,\dots , n\}\) and edges \(\mathcal{E}\), with mean \(\mu\) and precision matrix \(\mathbf{Q}\), if its probability distribution is given by

\[ \mathbf{u} \sim N(\mu,Q^{-1}) \] and

\(Q_{ij} \neq 0\Longleftrightarrow \{i,j\}\in\mathcal{E} ~~\forall~~i\neq j\)

Modelling spatial similarity

An example of a GMRF is the Besag model, a.k.a. the Intrinsic Conditional Autoregressive (ICAR) model. The conditional distribution for \(u_i\) is

\[ u_i|\mathbf{u}_{-i},\tau_u \sim N\left(\frac{1}{d_i}\sum_{j\sim i}u_j,\frac{1}{d_i\tau_u}\right) \]

  • \(\mathbf{u}_{-i} = (u_1,\ldots,u_{i-1},u_{i+1},\ldots,u_n)^T\)

  • \(\tau_u\) is the precision parameter (inverse variance).

  • \(d_i\) is the number of neighbours

  • The mean of \(u_i\) is equal to the mean of the effects over all neighbours, and the precision is proportional to the number of neighbours (e.g., if an area has many neighbours then its variance will be smaller)

Modelling spatial similarity

The joint distribution is given by:

\[ \mathbf{u}|\tau_u \sim N\left(0,\frac{1}{\tau_u}Q^{-1}\right), \]

Where \(Q\) denotes the precision matrix defined as

\[ Q_{i,j} = \begin{cases} d_i, & i = j \\ -1, & i \sim j \\ 0, &\text{otherwise} \end{cases} \]

This structure matrix directly defines the neighbourhood structure and is sparse.
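Assembling this structure matrix from a neighbourhood list is mechanical. A sketch in Python with a hypothetical four-region map (in the course this is handled in R via a graph object):

```python
import numpy as np

# Hypothetical neighbourhood list: region index -> its neighbours (symmetric)
nbs = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
m = len(nbs)

Q = np.zeros((m, m))
for i, neigh in nbs.items():
    Q[i, i] = len(neigh)      # diagonal: number of neighbours d_i
    for j in neigh:
        Q[i, j] = -1.0        # off-diagonal: -1 for each neighbouring pair

# Each row of an intrinsic CAR structure matrix sums to zero (Q is singular)
print(Q)
print(Q.sum(axis=1))
```

The zero row sums are what makes the Besag model *intrinsic*: the joint distribution is improper, which is why a sum-to-zero constraint is typically imposed in practice.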

How does it look?

Larynx cancer relative risk map

How does it look?

Connecting all the neighbouring areas give the following graph

How does it look?

  • Let us focus on one small part of the graph

How does it look?

  • We apply an ICAR model where each region conditionally has a Gaussian distribution with mean equal to the average of the neighbours and a precision proportional to the number of neighbours

    \[ x_9\mid\mathbf{x}_{-9}\sim N\left(\frac{1}{6}(x_7+x_{11}+x_{12}+x_{13}+x_{14}+x_{15}),\frac{1}{6\tau}\right) \]

How does it look?

The subgraph leads to a precision matrix with 21.6% non-zero elements.

How does it look?

The full graph leads to a precision matrix with 0.1% non-zero elements.

  • How do we choose the neighbourhood structure?

Neighbourhood structures

  • Each of our regions \(B_i\) has a set of other nearby regions which can be considered neighbours

  • We might expect that areas have more in common with their neighbours.

  • Therefore, we can construct dependence structures based on the principle that neighbours are correlated and non-neighbours are uncorrelated.

  • However, we need to come up with a sensible way of defining what a neighbour is in this context.

Defining a Neighbourhood

  • There are many different ways to define a region’s neighbours.

  • The most common ones fall into two main categories - those based on borders, and those based on distance.

Common borders

Assume that regions which share a border on a map are neighbours.

  • Simple and easy to implement

  • Treats all borders the same, regardless of length, which can be unrealistic.

  • Areas very close together are not neighbours if there is even a small gap between them.

Distance-based

Assume that regions which are within a certain distance of each other are neighbours.

  • What distance do you choose? How do you decide that?

  • Where do you measure from? (e.g., nearest border or a central point).

Neighbourhood matrix

  • Once we have identified a set of neighbours using our chosen method, we can use this to account for spatial correlation.

  • We construct a neighbourhood matrix (or proximity matrix), which defines how each of our \(m\) regions relate to each other.

  • Let \(W\) denote an \(m \times m\) matrix where the \((i,j)\)th entry, \(w_{ij}\) denotes the proximity between regions \(B_i\) and \(B_j\).

    Note

    The values of this matrix can be discrete (which regions are neighbours) or continuous (how far apart are the regions).

Binary Neighbourhood matrix

By far the most common approach is to use a binary neighbourhood matrix \(W\), defined by

\[ \begin{aligned} w_{ij} &= 1 \hspace{2mm} \mbox{ if areas} (B_i, B_j) \mbox{ are neighbours.}\\ w_{ij} &= 0 \hspace{2mm} \mbox{ otherwise.} \end{aligned} \]

  • Dependence structures are described through this spatial weights matrix

  • Binary matrices are used for their simplicity
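As a concrete illustration, a binary \(W\) based on a distance threshold between region centroids can be built as follows (Python sketch; the centroids and the cutoff are made up):

```python
import numpy as np

# Hypothetical region centroids (x, y)
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
threshold = 1.5  # assumed cutoff distance

# Pairwise Euclidean distances between centroids
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

# w_ij = 1 if regions are within the threshold (excluding self-pairs)
W = ((d > 0) & (d <= threshold)).astype(int)
print(W)
```

Note how the choice of threshold determines the result here: the fourth (distant) region ends up with no neighbours at all, which some software flags as a "no-neighbour observation".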

Measuring spatial correlation

How?

  • Now that we have defined a measure of spatial proximity for areal data, we can use this to assess spatial dependence.

  • Essentially, we can now ask the question of how similar a region is to its neighbours.

  • We can consider global correlation, measured across the entire region, and local correlation which allows for regional variation.

  • In this course, we will focus on modelling global autocorrelation using Moran’s I.

Moran’s \(I\)

  • Moran’s \(I\) is a measure of global spatial autocorrelation and can be considered an extension of the Pearson correlation coefficient.

  • For a set of data \(Z_1, \ldots, Z_m\) measured on regions \(B_1, \ldots, B_m\), with neighbourhood matrix \(W\), Moran’s \(I\) is given by:

\[ I = \frac{m}{\sum_{i=1}^m \sum_{j=1}^m w_{ij}} \frac{\sum_{i=1}^m \sum_{j=1}^m w_{ij}(Z_i - \bar{Z})(Z_j - \bar{Z})} {\sum_{i=1}^m (Z_i - \bar{Z})^2} \]

  • This is essentially a function of differences in values between neighbouring areas.
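The formula translates directly into code. A minimal Python sketch (the course itself uses moran.test() in R; the toy chain of regions here is made up):

```python
import numpy as np

def morans_i(z, W):
    """Moran's I for values z with neighbourhood weights matrix W."""
    z = np.asarray(z, dtype=float)
    zc = z - z.mean()
    num = (W * np.outer(zc, zc)).sum()   # sum_ij w_ij (z_i - zbar)(z_j - zbar)
    den = (zc**2).sum()                  # sum_i (z_i - zbar)^2
    return (len(z) / W.sum()) * num / den

# Toy 1-d chain of 4 regions; neighbours are adjacent positions
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(morans_i([1, 2, 3, 4], W))    # smoothly increasing values: positive I
print(morans_i([1, -1, 1, -1], W))  # alternating values: strongly negative I
```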

Moran’s \(I\)

  • Moran’s \(I\) ranges between \(-1\) and \(1\) and can be interpreted similarly to a standard correlation coefficient.

  • \(I = 1\) implies perfect spatial correlation.

  • \(I = 0\) implies complete spatial randomness.

  • \(I = -1\) implies perfect dispersion (negative correlation).

  • Our observed \(I\) is a point estimate, and we may also wish to assess whether it is significantly different from zero.

Assessing significance

  • We can test for statistically significant spatial correlation using a permutation test, with hypotheses:

\[ \begin{aligned} H_0 &: \text{no spatial association } (I = 0) \\ H_1 &: \text{some spatial association } (I \neq 0) \end{aligned} \]

  • We carry out \(k\) random permutations of our data (reassigning each data value to a random location) and compute Moran’s \(I\) for each permutation
    \((I_{\text{perm}} = I_1, \ldots, I_k)\).

  • We reject the null hypothesis if the observed Moran’s \(I\) (\(I_{\text{obs}}\)) could not plausibly have arisen from the distribution of \(I_{\text{perm}}\).
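The permutation scheme above can be sketched in a few lines of Python (the course uses R's moran.test(); the chain example and seed here are made up):

```python
import numpy as np

def morans_i(z, W):
    z = np.asarray(z, dtype=float)
    zc = z - z.mean()
    return (len(z) / W.sum()) * (W * np.outer(zc, zc)).sum() / (zc**2).sum()

def moran_perm_test(z, W, k=999, rng=None):
    """One-sided permutation test for positive spatial autocorrelation."""
    rng = np.random.default_rng(rng)
    i_obs = morans_i(z, W)
    # Reassign the data values to random locations and recompute I each time
    i_perm = [morans_i(rng.permutation(np.asarray(z)), W) for _ in range(k)]
    # Proportion of permuted statistics at least as large as the observed one
    p = (1 + sum(ip >= i_obs for ip in i_perm)) / (k + 1)
    return i_obs, p

# Toy example: a chain of 10 regions with smoothly increasing values
W = np.diag(np.ones(9), 1) + np.diag(np.ones(9), -1)
i_obs, p = moran_perm_test(np.arange(10.0), W, rng=1)
print(i_obs, p)
```

Adding 1 to both numerator and denominator of the p-value counts the observed arrangement as one of the permutations, which avoids reporting an exact zero.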

Examples

Perfect dispersion

\[ \begin{aligned} \text{Estimate} &\ -1 \\ \text{St. Dev.} &\ 0.095 \\ \text{p-value} &< 0.05 \end{aligned} \]

No spatial correlation

\[ \begin{aligned} \text{Estimate} &\ 0.001 \\ \text{St. Dev.} &\ 0.095 \\ \text{p-value} &\ 0.865 \end{aligned} \]

High correlation

\[ \begin{aligned} \text{Estimate} &\ 0.865 \\ \text{St. Dev.} &\ 0.095 \\ \text{p-value} &< 0.05 \end{aligned} \]

Fitting a CAR model

Suppose we identified spatial autocorrelation in our data… what do we do next?

  • Fitting autoregressive models to non-normal data—such as presence/absence responses—can be challenging due to the lack of simple distributional assumptions.

  • CAR models are often implemented within a Bayesian framework, which provides a flexible approach for handling spatially structured ecological & environmental data.

  • A key advantage of Bayesian hierarchical modelling is its ability to properly incorporate uncertainty and facilitate inference in complex spatial models.

A slight detour into Bayesian Statistics

Bayesian Inference

  • In the Bayesian framework all unknown quantities in the model are treated as random variables, and the aim is to estimate the joint posterior distribution of the unknown parameters \(\theta\) given the observed data \(\mathbf{y}\).

  • We obtain this distribution through Bayes’ theorem:

\[ \pi(\theta \mid \mathbf{y}) = \frac{\pi(\mathbf{y} \mid \theta)\pi(\theta)}{\pi(\mathbf{y})} \]

  • Components:

    • \(\pi(\mathbf{y} \mid \theta)\) is the likelihood of \(\mathbf{y}\) given parameters \(\theta\)
    • \(\pi(\theta)\) is the prior distribution of the parameters
    • \(\pi(\mathbf{y})\) is the marginal likelihood (normalizing constant):

    \[ \pi(\mathbf{y}) = \int_\Theta \pi(\mathbf{y} \mid \theta) \pi(\theta) d\theta \]

    • computing \(\pi(\mathbf{y})\) means integrating out all the uncertainty in \(\theta\)
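As a concrete (toy, not from the lecture) illustration of these components, consider a binomial observation model with a uniform prior, with the posterior computed on a discretised grid:

```python
import numpy as np
from math import comb

# Hypothetical toy data: y = 7 successes in n = 10 trials, flat prior on theta
theta = np.linspace(0.0005, 0.9995, 1000)
d_theta = theta[1] - theta[0]

prior = np.ones_like(theta)                    # pi(theta): uniform on (0, 1)
lik = comb(10, 7) * theta**7 * (1 - theta)**3  # pi(y | theta): binomial likelihood

marginal = (lik * prior).sum() * d_theta       # pi(y): integrate theta out
posterior = lik * prior / marginal             # pi(theta | y) via Bayes' theorem

post_mean = (theta * posterior).sum() * d_theta
# Conjugacy gives a Beta(8, 4) posterior here, so the mean should be near 8/12
print(round(post_mean, 3))
```

In this toy case the normalising constant is a one-dimensional integral; in the spatial models below \(\theta\) is multi-dimensional and the latent field is high-dimensional, which is why approximation methods such as INLA are needed.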

Bayesian Inference (Computation)

  • When the posterior distribution lacks a closed-form solution, computational methods must be used to approximate it.
  • To efficiently estimate spatially structured models, particularly those incorporating Conditional Autoregressive (CAR) priors, the Integrated Nested Laplace Approximation (INLA) framework (Van Niekerk et al., 2023) provides a powerful alternative to traditional Markov chain Monte Carlo (MCMC) methods.

Prior belief and the observation model are combined through Bayes’ theorem and Bayesian computations to yield the posterior distribution:

\[ \color{purple}{\pi(\mathbf{u},\theta|\mathbf{y})}\propto \color{#FF6B6B}{\pi(\mathbf{y}|\mathbf{u},\theta)}\color{#0066CC}{\pi(\mathbf{u}|\theta)\pi(\theta)} \]

The INLA Framework

What is INLA?

The short answer:

INLA is a fast and accurate method to do Bayesian inference with latent Gaussian models.

The INLA Framework

What is INLA?

The (much) longer answer:

  • Rue, H., Martino, S. and Chopin, N. (2009), Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71: 319-392.
  • Van Niekerk, J., Krainski, E., Rustand, D., & Rue, H. (2023). A new avenue for Bayesian inference with INLA. Computational Statistics & Data Analysis, 181, 107692.
  • Lindgren, F., Bachl, F., Illian, J., Suen, M. H., Rue, H., & Seaton, A. E. (2024). inlabru: software for fitting latent Gaussian models with non-linear predictors. arXiv preprint arXiv:2407.00791.
  • Lindgren, F., Bolin, D., & Rue, H. (2022). The SPDE approach for Gaussian and non-Gaussian fields: 10 years and still running. Spatial Statistics, 50, 100599.

The INLA Framework

INLA can be used with Bayesian hierarchical models where we model in different stages or levels:

  • Stage 1: What is the distribution of the responses?

  • Stage 2: What is the distribution of the underlying latent components?

  • Stage 3: What are our prior beliefs about the hyperparameters?

Stage 1: Data Generating Process

In Stage 1, we look at how our data \(\mathbf{y}\) are generated from the underlying components \(\mathbf{x}\) and hyperparameters \(\theta\) in the model.

Response types:

  • Gaussian (temperature, rainfall, weight)

  • Count data (disease cases, species counts)

  • Point patterns (tree locations)

  • Binary data (yes/no responses)

  • Survival data (time to event)

It is also important how data are collected!

All of this information is placed into our likelihood \(\pi(\mathbf{y}|\mathbf{x},\theta)\).

Stage 1: Data Generating Process

We assume that, given the underlying components (\(\mathbf{x}\)) and hyperparameters (\(\theta\)), the data are independent of each other:

\[ \pi(\mathbf{y}|\mathbf{x},\theta) = \prod_{i\in\mathcal{I}}\pi(y_i|x_i,\theta) \]

This implies that all the dependence structure in the data is explained in Stage 2.

Stage 2: Dependence Structure

The underlying unobserved components \(\mathbf{x}\) are called latent components and can be:

  • Fixed effects of covariates

  • Unstructured random effects (individual effects, group effects)

  • Structured random effects (AR(1), regional effects, \(\ldots\) )

These are linked to the responses in the likelihood through linear predictors.

Stage 2: Dependence Structure

  1. The latent field \(\mathbf{u}\) is a GMRF \[ \pi(\mathbf{u}|\theta)\propto\exp\left\{-\frac{1}{2}\mathbf{u}^T\mathbf{Q}\mathbf{u}\right\} \]

    • The precision matrix \(\mathbf{Q}\) is sparse.
  2. We can factorize the likelihood as: \(\pi(\mathbf{y}|\mathbf{u},\theta) = \prod_i\pi(y_i|\eta_i,\theta)\)

    • Data are conditionally independent given \(\mathbf{u}\) and \(\theta\)
    • Each data point depends on only one element of the latent field: the predictor \(\eta_i\)
    • \(\boldsymbol{\eta}\) is a linear combination of elements of \(\mathbf{u}\): \(\boldsymbol{\eta} = \mathbf{A}^T\mathbf{u}\)
  3. The vector of hyperparameters \(\theta\) is low-dimensional.

Stage 3: Hyperparameters

The likelihood and the latent model typically have hyperparameters that control their behavior.

The hyperparameters \(\theta\) can include:

Examples (likelihood):

  • Variance of observation noise

  • Dispersion parameter in the negative binomial model

  • Probability of a zero (zero-inflated models)

Examples (latent model):

  • Variance of unstructured effects

  • Correlation of multivariate effects

  • Range and variance of spatial effects

  • Autocorrelation parameter \(\rho\)

Latent Gaussian Models (LGMs)

These three stages constitute the basis of latent Gaussian models (LGMs).

  • A LGM consists of three elements:
    1. a likelihood model
    2. a latent Gaussian field (i.e., the latent components of our model)
    3. a vector of non-Gaussian hyperparameters
  • The characteristic property is that the latent part of the hierarchical model is Gaussian, \(\mathbf{u}|\theta \sim N(0,Q^{-1})\), where the expected value is \(0\) and the precision matrix is \(Q\).

INLA strategy in short

Assumption

We are mainly interested in posterior marginals \(p(u_i|\mathbf{y})\) and \(p(\theta_j|\mathbf{y})\)

Strategy

  • We use numerical integration in a smart way
  • We approximate what we do not know analytically exploiting the Gaussian structure of \(\mathbf{u}|y\).

Case Study: Lip cancer rates in Scotland

Example: Lip cancer rates in Scotland

In this example we model the number of lip cancer cases in Scotland in the years 1975–1980 at the county level, in order to evaluate the presence of an association between sun exposure and lip cancer.

In epidemiology, disease risk is assessed using Standardized Mortality Ratios (SMR):

\[ SMR_i = \dfrac{Y_i}{E_i} \]

  • A value \(SMR > 1\) indicates a high risk area.
  • A value \(SMR<1\) suggests a low risk area.
  • SMRs may be misleading in counties with small populations.
  • Model-based approaches enable us to incorporate covariates and to borrow information from neighbouring counties to improve local estimates.

Do we need to model spatial dependence?

Recall that, Moran’s \(I\) ranges between -1 and 1, and can be interpreted in a similar way to a standard correlation coefficient.

  • \(I=1\) implies that we have perfect spatial correlation.

  • \(I=0\) implies that we have complete spatial randomness.

  • \(I=-1\) implies that we have perfect dispersion (negative correlation).

Our observed \(I\) is a point estimate, and we may also wish to assess whether it is significantly different from zero. We can test for a statistically significant spatial correlation using a permutation test, with hypotheses:

\[ \begin{aligned} H_0&: \text{ negative or no spatial association } (I \leq 0)\\ H_1&: \text{ positive spatial association } (I > 0) \end{aligned} \]

Do we need to model spatial dependence?

We can use moran.test() to test this hypothesis by setting alternative = "greater". To do so, we need to supply a list containing the neighbours via the nb2listw() function from the spdep package. Let’s now assess the spatial autocorrelation of the SMR:


    Moran I test under randomisation

data:  scotland_sf$SMR  
weights: R  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 5.3678, p-value = 3.984e-08
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.437119239      -0.019230769       0.007227652 

Since we have set the alternative hypothesis to be \(I > 0\) and have a p-value \(<0.05\), we reject the null hypothesis and conclude there is evidence for positive spatial autocorrelation.

BYM model

The ICAR model accounts only for spatially structured variability and does not include a limiting case where no spatial structure is present.

  • We typically add an unstructured random effect \(z_i\mid \tau_z \sim N(0,\tau_{z}^{-1})\)

  • The resulting model \(v_i = u_i + z_i\) is known as the Besag-York-Mollié model (BYM)

  • The degree of smoothing of the structured spatial effect is controlled by \(\tau_u\):

    • Higher \(\tau_u\) values lead to stronger smoothing (less spatial variability).

    • Lower \(\tau_u\) values allow for greater local variation.

Example: Lip cancer rates in Scotland

  • Stage 1: We assume the responses are Poisson distributed: \[ \begin{aligned}y_i|\eta_i & \sim \text{Poisson}(E_i\lambda_i)\\\text{log}(\lambda_i) = \color{#FF6B6B}{\boxed{\eta_i}} & = \color{#FF6B6B}{\boxed{\beta_0 + \beta_1 \mathrm{AFF} + u_i + z_i} }\end{aligned} \]
  • Stage 2: \(\eta_i\) is a linear function of four components: an intercept, an effect of the proportion of the population engaged in agriculture, fishing, or forestry (AFF), a spatially structured effect \(u_i\), and an unstructured iid random effect \(z_i\): \[ \eta_i = \beta_0 + \beta_1\mathrm{AFF}_i + u_i + z_i \]
  • Stage 3: \(\{\tau_{z},\tau_u\}\): Precision parameters for the random effects

The latent field is \(\mathbf{x}= (\beta_0, \beta_1, u_1, \ldots, u_n, z_1, \ldots, z_n)\), and the hyperparameters \(\boldsymbol{\theta} = (\tau_u,\tau_z)\) must be given priors.

Example: Lip cancer rates in Scotland

The Model

\[ \begin{aligned} y_i|\eta_t & \sim \text{Poisson}(E_i\lambda_i)\\ \text{log}(\lambda_i) = \eta_i & = \beta_0 + \beta_1 \mathrm{AFF}_i + u_i + v_i \end{aligned} \]

  • \(\beta_0\) is the intercept that represents the overall risk

  • \(\beta_1\) is the coefficient of the AFF covariate

  • \(u_i\) is a spatially structured component

  • \(v_i\) is a spatially unstructured component

Neighbourhood structure

  • The precision matrix \(Q\) depends on \(\tau_u\) and the neighboring structure

  • To make the precision parameters of models with different intrinsic Gaussian random fields comparable, we add a sum-to-zero constraint \(\sum_{i=1}^n u_i = 0\).
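As a sketch (assuming the spdep package and the `scotland_sf` polygons used in the model code below), the adjacency matrix `Q` passed to the besag component can be built from the map itself:

```r
library(spdep)

# Queen-contiguity neighbours from the polygon boundaries
nb <- poly2nb(scotland_sf)

# Binary (0/1) adjacency matrix usable as the `graph` argument;
# zero.policy allows regions with no neighbours (islands)
Q <- nb2mat(nb, style = "B", zero.policy = TRUE)
```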

Example: Lip cancer rates in Scotland

cmp = ~ Intercept(1) + beta_1(AFF, model = "linear") +
  u_i(region_id, model = "besag", graph = Q, scale.model = TRUE) +
  z_i(region_id, model = "iid")

formula = cases ~ Intercept + beta_1 + u_i + z_i

lik = bru_obs(formula = formula, 
              family = "poisson",
              E = expected,
              data = scotland_sf)

fit = bru(cmp, lik)

INLA Model Results
Posterior summaries of fixed effects and hyperparameters

Parameter           Mean        2.5% Quantile  97.5% Quantile
Intercept           −0.306      −0.538         −0.069
beta_1              4.317       1.758          6.761
Precision for u_i   4.139       2.023          7.597
Precision for z_i   22,037.716  1,474.245      86,082.203
  • The model reveals a strong positive association between AFF (a proxy for outdoor work and hence sun exposure) and lip cancer risk

Compare Relative Risk against SIR

The relative risks \(\lambda_i\) can be obtained as follows:

\[ \begin{aligned} \lambda_i &= \exp\left(\eta_i\right) \\ &= \exp\Bigl(\beta_0 + \beta_1 \times \mathrm{AFF}_i + u_i + z_i\Bigr) \end{aligned} \] We see that the RR values are shrunk towards 1 compared with the SIR values, because each area’s spatial effect is pulled toward the average of its neighbours.
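With inlabru, posterior summaries of the relative risks can be obtained directly from the fitted model. A sketch, reusing the component names defined in the earlier model code:

```r
# Posterior summaries of the relative risk exp(eta_i) for each area
rr <- predict(fit, scotland_sf,
              ~ exp(Intercept + beta_1 + u_i + z_i))

# rr$mean holds the posterior mean relative risk per region,
# which can be mapped alongside the raw SIR values for comparison
```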

BYM2 Model

In the original BYM, the spatially structured component must be scaled so that \(\tau_u\) produces consistent smoothness across different neighborhood structures.

Simpson et al. (2017) proposed a new parametrization of the BYM model that improves parameter interpretability.

\[ \mathbf{b} = \dfrac{1}{\sqrt{\tau_b}} \left(\sqrt{1-\phi}v^*+\sqrt{\phi}u^*\right). \]

  • The precision \(\tau_b>0\) controls the marginal variance of the weighted sum of \(u^*\) and \(v^*\).

  • The mixing parameter \(0 \leq \phi \leq 1\) measures the proportion of the marginal variance explained by the structured effect \(u^*\):

    • if \(\phi =1\) the model captures only spatially structured variability

    • if \(\phi = 0\) it accounts solely for unstructured spatial noise.

PC priors

The BYM2 model allows the specification of Penalized Complexity (PC) priors to control the amount of spatial smoothing and avoid overfitting.

  • PC priors shrink the model towards a simpler baseline unless the data provide strong evidence for a more complex structure.
  • To define the prior for the marginal precision \(\tau_b\) and the mixing parameter \(\phi\), we use the probability statements:

\[ \begin{aligned} P(1/\sqrt{\tau_b} > U) &= \alpha\\ P(\phi < U) &= \alpha \end{aligned} \]

For the lip cancer example we could specify: \[ \begin{aligned} P((1/\sqrt{\tau_b}) > 0.5/0.31) &\equiv P(\sigma > 1.61) = 0.01\\ P(\phi < 0.5) &= 2/3 \approx 0.66 \end{aligned} \]

Interpretation:

  • Prior on \(\tau_b\): low prior probability of large standard deviations

  • Prior on \(\phi\) assumes that the unstructured random effect accounts for more of the variability than the spatially structured effect.
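Putting the pieces together, the BYM2 component with the PC priors above could be specified in inlabru as follows. This is a sketch: the component and data names mirror the earlier example, and the hyperparameter names (`prec`, `phi`) follow INLA's "bym2" model.

```r
cmp2 = ~ Intercept(1) + beta_1(AFF, model = "linear") +
  b_i(region_id, model = "bym2", graph = Q, scale.model = TRUE,
      hyper = list(
        # P(sigma > 0.5/0.31) = 0.01
        prec = list(prior = "pc.prec", param = c(0.5 / 0.31, 0.01)),
        # P(phi < 0.5) = 2/3
        phi  = list(prior = "pc", param = c(0.5, 2 / 3))))

fit2 = bru(cmp2,
           bru_obs(formula = cases ~ Intercept + beta_1 + b_i,
                   family = "poisson",
                   E = expected,
                   data = scotland_sf))
```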

Summary of points

  • Space is inherent to many ecological and environmental phenomena

  • We can distinguish between three main types of spatial data:

    • Areal data - data summarised across discrete, non-overlapping regions

    • Geostatistical data - a continuous process measured at a fixed set of locations

    • Point pattern data - locations where events occur, treated as a realization of a spatial point process

  • The spatial scale, defined by the grain and extent, determines the conclusions we can draw from our analysis (MAUP and ecological fallacy).

  • Spatial data are often not independent \(\rightarrow\) spatial dependency can complicate statistical analysis.

Summary of points

  • Spatial dependence can be incorporated through Gaussian Markov random fields (GMRFs)

  • The conditional independence structure of a GMRF induces sparsity in the precision matrix, which makes computation efficient.

  • The Besag (or ICAR) model is a GMRF used to model spatial dependence in areal data based on a neighbourhood structure.

  • There are different ways of defining the neighbourhood structure and the associated weight matrices.

Summary of points

  • CAR models can be fitted using Bayesian methods

  • INLA is an efficient Bayesian approximation method for LGMs.

  • LGMs are defined based on:

    • Model likelihood \(p(y_i \mid u_i, \theta)\)

    • Latent GMRF \(p(\mathbf{u}\mid \theta)\)

    • Vector of hyperparameters with prior \(p(\theta)\)

  • Use Moran’s I to assess spatial dependence.

  • Use the BYM model to extend the ICAR model to account for unstructured (non-spatial) variation.

  • The BYM2 model provides further flexibility and interpretability

  • PC priors enable intuitive construction of priors based on probability statements.